Abstract: For reliable and sustainable decision making, it is essential to perform intelligent sensing and data collection at scalable network resource costs. The sensor platforms used in warfare may be under attack from adversarial forces, which would largely degrade the overall performance of surveillance systems. Thus, it is crucial that each intelligent sensor have the capability of detecting and avoiding possible attacks. In this paper, we study an attack-avoidance problem under the framework of an LQ game formulation. This is a first attempt to solve this kind of problem. From a practical point of view, the inherent hard constraints have been approximated and replaced by soft constraints with a fixed optimization horizon. For implementation, a receding horizon scheme has been used in conjunction with the LQ strategies. Overall, the LQ strategies can provide good control guidance laws for the players.

Keywords: Tracking, game, linear quadratic, equilibrium.

1  Introduction 

In modern military operations, it is desirable to have heterogeneous sensor platforms and distributed warfare assets that are strategically responsive, sustainable and survivable, and that provide surveillance and situation awareness. It is therefore essential to perform intelligent sensing and data collection for reliable and sustainable decision making, at scalable costs to the network resources. To date, recent work seeks to reduce sensor management noise, communication overhead and computational complexity, and to improve scalability [1, 2]. However, the sensor platforms used in warfare may be under attack from adversarial forces, which would largely degrade the overall performance of surveillance systems. Thus, it is crucial that each intelligent sensor have the capability of detecting and avoiding possible attacks. This calls for novel sensor fusion approaches to threat assessment (typically called Level 3 fusion) that account for sensor management constraints (typically called Level 4 fusion).


To avoid the complexity involved in networked sensors, as a first attempt, we study an attack-avoidance problem with only one sensor. The problem involves four entities: sensor, environment, target and attacker. Here, the sensor tracks a target in a given environment; the target may be stationary or move along its predetermined trajectory. An attacker wants to collide with the sensor to destroy it, while the sensor tries to avoid it. This type of attack-avoidance problem is new. Although it shares similarities with conventional tracking and object avoidance problems, it certainly has more ingredients: it involves both tracking and the conflict of pursuit and evasion. Control strategies designed purely for tracking or object avoidance are therefore not directly applicable.

This attack-avoidance problem also shares some similarities with pursuit-evasion (PE) games. In a typical PE game, two players are present, i.e., a pursuer and an evader¹. To study the optimal pursuit or evasion strategy, the problem is formulated as a zero-sum game. The pursuer tries to minimize a prescribed cost functional while the evader tries to maximize the same functional [3, 4]. Dynamic Programming (DP) is a general method for solving such games. In the literature, a number of formal solutions regarding optimal strategies in particular PE problems have been obtained [3, 4, 5, 6]. Owing to the development of Linear Quadratic (LQ) optimal control theory, a large portion of the literature focuses on PE differential games with linear dynamics and a performance criterion in quadratic form [5, 7].

However, with an additional attacker, the existing results on conventional PE games are largely not applicable to the sensor attack-avoidance problem. Here, the point of interest is no longer pure pursuit or evasion. The refined strategy for the sensor is to continuously track the target while avoiding possible attacks (within a certain period of time). To our knowledge, there is little direct literature on the attack-avoidance problem.


¹ Readers should be aware that pursuit-evasion games involving multiple pursuers and evaders have been studied in the literature.
Missile guidance and navigation is another related research area. In a conventional navigation problem, control laws are designed for an interceptor to track a moving target (with no attacker). The proportional navigation guidance law and its variants have been the most widely employed techniques against non-maneuvering targets due to their simplicity and ease of implementation [8]. Another large class of guidance laws relevant to this problem consists of those designed based on optimal control theory, many of which are applications of LQ optimal control theory [8].

In this paper, we formulate the attack-avoidance problem as a zero-sum game between the sensor and the attacker. This is a first attempt at such a problem, and to avoid theoretical difficulties, we adopt an LQ formulation to make use of the existing LQ game theory. In particular, as a practical approach, penalty terms are used as soft constraints in the adopted game formulation. With additional assumptions on the linear dynamics of the players, LQ differential game theory is applicable. Furthermore, a practical approach to this emerging problem is developed with a sequential implementation scheme. The performance of the algorithm is demonstrated through simulations, which validate the usefulness of the approach.

In the proposed LQ game approach, the key assumption and the main limitation is that the trajectory of the target (at least in the immediate future) is known to both the sensor and the attacker. In many cases, this assumption is not valid in sensor applications, since collecting information about the target is the main goal. However, the approach is still worth considering because it offers the sensor an opportunity to avoid attacks while keeping track of a target. In this sense, the assumption can be interpreted as the sensor's prediction of the target's movement. In a broader sense, the target here can also represent an uncertain area to be searched, and the “known trajectory” represents the areas of highest interest.

The paper is organized as follows. In Section 2, an attack-avoidance game is formulated with linear dynamics and a quadratic objective based on soft constraints. Equilibrium strategies of the players are derived in Section 3. An implementation scheme is then introduced in Section 4 to fill the gap between the LQ formulation and real-world attack-avoidance problems. In Section 5, we evaluate the performance of the proposed strategies by simulations and draw comparisons with existing strategies. Concluding remarks are provided in Section 6.

2 Linear Quadratic Formulation with Soft Constraints

In this section, we formulate the attack-avoidance problem using soft constraints under the LQ framework with a fixed horizon. Consider a sensor, an attacker and a target in an $n_g$-dimensional space $S \subseteq \mathbb{R}^{n_g}$ with $n_g \in \mathbb{N}$. Let $x_s \in \mathbb{R}^{n_s}$, $x_a \in \mathbb{R}^{n_a}$ and $x_t \in \mathbb{R}^{n_t}$ be the state variables of the sensor, the attacker and the target, respectively, with $n_s, n_a, n_t \geq n_g$. Suppose that each player in the game has its own independent dynamics, described respectively by the following linear equations.


$$ \dot{x}_s(t) = A_s x_s(t) + B'_s u_s(t), \quad x_s(t_0) = x_{s0} \qquad (1a) $$
$$ \dot{x}_a(t) = A_a x_a(t) + B'_a u_a(t), \quad x_a(t_0) = x_{a0} \qquad (1b) $$
$$ \dot{x}_t(t) = A_t x_t(t) + B'_t u_t(t), \quad x_t(t_0) = x_{t0} \qquad (1c) $$

Here, $x_s(t) \in \mathbb{R}^{n_s}$, $x_a(t) \in \mathbb{R}^{n_a}$ and $x_t(t) \in \mathbb{R}^{n_t}$ for $t \geq t_0$; $u_s(t) \in U_s$, $u_a(t) \in U_a$ and $u_t(t) \in U_t$ are control inputs; $A_s$, $A_a$, $A_t$, $B'_s$, $B'_a$, $B'_t$ are real matrices of proper dimensions. Suppose that the first $n_g$ elements of $x_s$ ($x_a$, $x_t$) stand for the physical position of the sensor (attacker, target) in S. We can define a projection operator $P: \mathbb{R}^{n_s} \mapsto S$ for the sensor as

$$ P(x_s) = [x_{s1}, \ldots, x_{s n_g}]^T \in S. \qquad (2) $$

That is, $P(x_s)$ gives the sensor's position in S. Similar operators can also be defined for the attacker and the target, and here we use the same notation P.
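Because P only selects the first $n_g$ entries of a state, it is a linear map and can be written as multiplication by a selection matrix; this linearity is what allows the distance penalties below to be quadratic in the state. The following is a minimal sketch assuming NumPy; the helper names `projection_matrix` and `P` and the planar [position, velocity] example state are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def projection_matrix(n_g: int, n: int) -> np.ndarray:
    """n_g x n selection matrix E such that E @ x returns the first n_g entries of x."""
    if not 1 <= n_g <= n:
        raise ValueError("require 1 <= n_g <= n")
    E = np.zeros((n_g, n))
    E[:, :n_g] = np.eye(n_g)
    return E


def P(x: np.ndarray, n_g: int) -> np.ndarray:
    """Projection operator of (2): the position part of a player's state."""
    return projection_matrix(n_g, x.shape[0]) @ x


# Hypothetical planar state [x, y, v_x, v_y]; the position is the first n_g = 2 entries.
x_s = np.array([1.0, -2.0, 0.5, 0.3])
print(P(x_s, 2))
```

The same selection matrix, with a different column count, serves the attacker and the target.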

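For simplicity, we use the following aggregate dynamic equation for the sensor and the attacker:

$$ \dot{\bar{x}}(t) = \bar{A} \bar{x}(t) + \bar{B}_s u_s(t) + \bar{B}_a u_a(t), \qquad (3) $$

where

$$ \bar{x} = \begin{bmatrix} x_s \\ x_a \end{bmatrix}, \quad \bar{A} = \begin{bmatrix} A_s & 0 \\ 0 & A_a \end{bmatrix}, \quad \bar{B}_s = \begin{bmatrix} B'_s \\ 0 \end{bmatrix}, \quad \bar{B}_a = \begin{bmatrix} 0 \\ B'_a \end{bmatrix}. $$

Define $x = [x_s^T, x_a^T, x_t^T]^T \in \mathbb{R}^n$ with $n = n_s + n_a + n_t$. We assume that each player can access the state x at any time t, and in this paper, feedback strategies are considered. Let $\gamma_s: \mathbb{R}^n \times \mathbb{R} \mapsto U_s$ and $\gamma_a: \mathbb{R}^n \times \mathbb{R} \mapsto U_a$ denote the strategies of the sensor and the attacker, respectively. Given $x \in \mathbb{R}^n$ and time $0 \leq t \leq T$, $\gamma_s(x, t) \in U_s$ and $\gamma_a(x, t) \in U_a$. Denote by $\Gamma_s$, $\Gamma_a$ the sets of admissible feedback strategies of the two players.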
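Assembling the block structure of (3) is mechanical. The sketch below assumes NumPy and uses a small hand-written block-diagonal helper in place of `scipy.linalg.block_diag` to keep dependencies light; the function names and the planar single-integrator example ($A_s = A_a = 0$, $B'_s = B'_a = I$) are illustrative assumptions.

```python
import numpy as np


def block_diag(*blocks: np.ndarray) -> np.ndarray:
    """Minimal block-diagonal assembly (same idea as scipy.linalg.block_diag)."""
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out


def aggregate_dynamics(A_s, Bp_s, A_a, Bp_a):
    """Return (A_bar, B_bar_s, B_bar_a) of (3) for the stacked state [x_s; x_a]."""
    n_s, n_a = A_s.shape[0], A_a.shape[0]
    A_bar = block_diag(A_s, A_a)
    B_bar_s = np.vstack([Bp_s, np.zeros((n_a, Bp_s.shape[1]))])  # u_s drives x_s only
    B_bar_a = np.vstack([np.zeros((n_s, Bp_a.shape[1])), Bp_a])  # u_a drives x_a only
    return A_bar, B_bar_s, B_bar_a


# Illustrative planar single integrators: A_s = A_a = 0, B'_s = B'_a = I.
A_bar, B_bar_s, B_bar_a = aggregate_dynamics(np.zeros((2, 2)), np.eye(2),
                                             np.zeros((2, 2)), np.eye(2))
print(A_bar.shape, B_bar_s.shape, B_bar_a.shape)  # (4, 4) (4, 2) (4, 2)
```

Because each input enters only its own player's rows, the stacked model reproduces (1a)-(1b) exactly.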

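We consider the objective functional of the following form:

$$ \begin{aligned} J(\gamma_s, \gamma_a, x_0) = \int_{0}^{T} \Big( & u_s^T(\tau) u_s(\tau) - u_a^T(\tau) u_a(\tau) \\ & + w^I_s \|P(x_s(\tau)) - P(x_t(\tau))\|^2 - w^I_a \|P(x_a(\tau)) - P(x_s(\tau))\|^2 \Big) d\tau \\ & + w_s \|P(x_s(T)) - P(x_t(T))\|^2 - w_a \|P(x_a(T)) - P(x_s(T))\|^2. \end{aligned} \qquad (4) $$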


In (4), $\gamma_s$, $\gamma_a$ are feedback strategies; $u_s$, $u_a$ are the control inputs associated with each corresponding strategy; $w_s > 0$, $w_a > 0$, $w^I_s > 0$ and $w^I_a > 0$ are weighting scalars associated with the costs induced by the distance between the sensor and the target and by the distance between the attacker and the sensor. In this formulation, the distance between the attacker and the sensor is used as a penalty term to approximate the “hard constraint” of the problem, which mandates that the sensor stay out of reach of the attacker. The use of a penalty term is a common approach in optimal control and differential game theory to deal with hard constraints, especially under the LQ framework [9]. Note that a penalty on the distance between the sensor and the target is also included, which enables the sensor to closely track the target (while avoiding the attacker) under the resulting strategy. The fixed time duration T and the scalars $w_s$, $w_a$, $w^I_s$, $w^I_a$ are design parameters, and their values are case dependent.
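Since P is linear, each distance in (4) is a linear function of the stacked state $x = [x_s^T, x_a^T, x_t^T]^T$, so each weighted pair of penalties is a quadratic form $x^T M x$. The minimal NumPy sketch below (function names are hypothetical) builds M from the two difference maps $x \mapsto P(x_s) - P(x_t)$ and $x \mapsto P(x_a) - P(x_s)$ and checks the identity on random data; the same construction applies to the running weights $(w^I_s, w^I_a)$ and to the terminal weights $(w_s, w_a)$.

```python
import numpy as np


def selector(n_g: int, n: int) -> np.ndarray:
    """n_g x n matrix picking the first n_g (position) entries of an n-dimensional state."""
    E = np.zeros((n_g, n))
    E[:, :n_g] = np.eye(n_g)
    return E


def distance_penalty_matrix(w_s, w_a, n_s, n_a, n_t, n_g):
    """Symmetric M with x^T M x = w_s*|P(x_s)-P(x_t)|^2 - w_a*|P(x_a)-P(x_s)|^2."""
    E_s, E_a, E_t = selector(n_g, n_s), selector(n_g, n_a), selector(n_g, n_t)
    D_st = np.hstack([E_s, np.zeros((n_g, n_a)), -E_t])   # x -> P(x_s) - P(x_t)
    D_as = np.hstack([-E_s, E_a, np.zeros((n_g, n_t))])   # x -> P(x_a) - P(x_s)
    return w_s * D_st.T @ D_st - w_a * D_as.T @ D_as


# Check on random planar states with velocities (n_s = n_a = 4), target position only (n_t = 2).
rng = np.random.default_rng(0)
n_s, n_a, n_t, n_g = 4, 4, 2, 2
x_s, x_a, x_t = rng.normal(size=n_s), rng.normal(size=n_a), rng.normal(size=n_t)
x = np.concatenate([x_s, x_a, x_t])
M = distance_penalty_matrix(10.0, 100.0, n_s, n_a, n_t, n_g)
direct = 10.0 * np.sum((x_s[:2] - x_t[:2]) ** 2) - 100.0 * np.sum((x_a[:2] - x_s[:2]) ** 2)
print(np.isclose(x @ M @ x, direct))  # True
```

In the result, the sensor-position block is $(w_s - w_a) I$, the sensor/attacker coupling is $w_a I$ and the sensor/target coupling is $-w_s I$, restricted to the position entries.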

The objective J in (4) can be rewritten in a quadratic form with respect to x, $u_s$ and $u_a$, i.e.,

$$ J(\gamma_s, \gamma_a, x_0) = \int_{0}^{T} \left( u_s^T(t) u_s(t) - u_a^T(t) u_a(t) + x^T(t) Q x(t) \right) dt + x^T(T) Q_f x(T), \qquad (5) $$

where Q and $Q_f$ are defined through the mapping $Q_2$: $Q = Q_2(w^I_s, w^I_a)$ and $Q_f = Q_2(w_s, w_a)$, where $Q_2: \mathbb{R} \times \mathbb{R} \mapsto \mathbb{R}^{n \times n}$ ($n = n_s + n_a + n_t$) is defined in (6) below:

$$ Q_2(w_s, w_a) = \begin{bmatrix} (w_s - w_a) I^{n_g}_{n_s \times n_s} & w_a I^{n_g}_{n_s \times n_a} & -w_s I^{n_g}_{n_s \times n_t} \\ w_a I^{n_g}_{n_a \times n_s} & -w_a I^{n_g}_{n_a \times n_a} & 0_{n_a \times n_t} \\ -w_s I^{n_g}_{n_t \times n_s} & 0_{n_t \times n_a} & w_s I^{n_g}_{n_t \times n_t} \end{bmatrix}. \qquad (6) $$

In (6), $I^{n_g}_{n_1 \times n_2}$ is an $n_1 \times n_2$ matrix in which the first $n_g$ rows and $n_g$ columns form an identity matrix, and the rest of the entries are zero.

This attack-avoidance game is a zero-sum game, where the sensor seeks a strategy $\gamma_s \in \Gamma_s$ to minimize J subject to (3), while the attacker tries to maximize J with $\gamma_a \in \Gamma_a$. The game can be viewed as a dual tracking problem, where the sensor wants to track the target but avoid the attacker, and at the same time, the attacker needs to follow the sensor closely.

3 Game Solution for the Attack-Avoidance Problem

3.1 Review of LQ Game Theory

We first introduce the players' saddle-point equilibrium strategies in a two-player LQ differential game. This will be the major tool that we rely on in this paper. Let us consider a game involving two players with the following linear dynamics:

$$ \dot{x}(t) = A x(t) + B_1 u_1(t) + B_2 u_2(t). \qquad (7) $$

Note that here x is the state variable of this generic game and differs from the aggregate state x defined above; this meaning of the notation holds only in this subsection. The objective function is given as

$$ J = \int_{0}^{T} \left( u_1^T(t) u_1(t) - u_2^T(t) u_2(t) + x^T(t) Q x(t) \right) dt + x^T(T) Q_f x(T). \qquad (8) $$

The following LQ theorem specifies saddle-point equilibrium feedback strategies for both players.

Theorem 1. The game with the players' dynamics in (7) and the objective J in (8) admits a feedback saddle-point solution given by $u_1^*(t) = \gamma_1^*(x(t), t) = K_1^*(t) x(t)$ and $u_2^*(t) = \gamma_2^*(x(t), t) = K_2^*(t) x(t)$ with $K_1^*(t) = -B_1^T Z(t)$ and $K_2^*(t) = B_2^T Z(t)$, where Z(t) is bounded, symmetric and satisfies

$$ \dot{Z} = -A^T Z - Z A - Q + Z (B_1 B_1^T - B_2 B_2^T) Z, \quad Z(T) = Q_f. \qquad (9) $$

Readers can refer to [4] and [10] for a detailed proof.

3.2 Game Solution for the Attack-Avoidance Problem

In the attack-avoidance game, we consider a target that moves along a predetermined trajectory $x_t(\cdot)$ in $\mathbb{R}^{n_t}$. The movement of the target is known to both the sensor and the attacker.

Consider the dynamics of the sensor and the attacker in (1a)-(1b). Note that in (1), the target's control is known (since its trajectory is known), and the game is played between the sensor and the attacker. By inspection of the objective (4), we find that an attack-avoidance game with an arbitrarily moving target is closely related to an LQ regulator problem with a reference state trajectory [11]. In what follows, we make use of this analogy to solve the game. The following theorem provides saddle-point strategies of the players.

Theorem 2. Suppose that the target trajectory $x_t(t)$ is known. The attack-avoidance game with the dynamics in (1a) and (1b) and the objective J in (4) admits a feedback saddle-point solution under the strategies

$$ u_s^* = \gamma_s^*(x, t) = -\bar{B}_s^T Z_{11} \bar{x} - \bar{B}_s^T b, \qquad (10) $$

$$ u_a^* = \gamma_a^*(x, t) = \bar{B}_a^T Z_{11} \bar{x} + \bar{B}_a^T b, \qquad (11) $$

where $\bar{B}_s$, $\bar{B}_a$ and $\bar{x}$ are defined in (3); the $h \times h$ ($h = n_s + n_a$) matrix $Z_{11}$ is bounded and satisfies

$$ \dot{Z}_{11} + \bar{A}^T Z_{11} + Z_{11} \bar{A} + Q_{11} - Z_{11} (\bar{B}_s \bar{B}_s^T - \bar{B}_a \bar{B}_a^T) Z_{11} = 0, \quad Z_{11}(T) = Q_{f11}. \qquad (12) $$

Here $Q_{11}$, $Q_{12}$, $Q_{f11}$ and $Q_{f12}$ are the corresponding submatrices of the matrices Q and $Q_f$ partitioned as

$$ Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{12}^T & Q_{22} \end{bmatrix} \quad \text{and} \quad Q_f = \begin{bmatrix} Q_{f11} & Q_{f12} \\ Q_{f12}^T & Q_{f22} \end{bmatrix}, $$

with Q defined in (5). Matrices $Q_{11}$ and $Q_{f11}$ are $h \times h$ matrices; $\bar{A}$ is given in (3); the time-varying vector b is specified by

$$ \dot{b}(t) = \left[ -\bar{A}^T + Z_{11} (\bar{B}_s \bar{B}_s^T - \bar{B}_a \bar{B}_a^T) \right] b(t) - Q_{12} x_t(t), \quad b(T) = Q_{f12} x_t(T). \qquad (13) $$

Proof. We use Theorem 1 to prove the theorem. For the time being, we assume that the trajectory of the target $x_t(\cdot)$ is generated by an autonomous linear system (without control) as

$$ \dot{x}_t = A_t x_t \quad \text{with } x_{t0} \text{ given}. \qquad (14) $$

Later, we will show that this assumption is not necessary.

Combining the dynamic equations (1a)-(1b) with (14), we can write an aggregate dynamic equation as

$$ \dot{x}(t) = A x(t) + B_s u_s(t) + B_a u_a(t), \qquad (15) $$

where x, A, $B_s$, $B_a$ are defined as

$$ x = \begin{bmatrix} x_s \\ x_a \\ x_t \end{bmatrix}, \quad A = \begin{bmatrix} A_s & 0 & 0 \\ 0 & A_a & 0 \\ 0 & 0 & A_t \end{bmatrix}, \quad B_s = \begin{bmatrix} B'_s \\ 0 \\ 0 \end{bmatrix} \quad \text{and} \quad B_a = \begin{bmatrix} 0 \\ B'_a \\ 0 \end{bmatrix}. $$

The objective is still the same as (5).

For a game with the objective in (5) and the players' dynamics in (15), Theorem 1 is applicable. That is, if the following Riccati equation

$$ \dot{Z} = -A^T Z - Z A - Q + Z (B_s B_s^T - B_a B_a^T) Z, \quad Z(T) = Q_f \qquad (16) $$

admits a solution Z over the interval [0, T], the saddle-point strategies of the sensor and the attacker are given by

$$ u_s^*(t) = -B_s^T Z(t) x(t) \quad \text{and} \quad u_a^*(t) = B_a^T Z(t) x(t). \qquad (17) $$

In (17), Z is an $n \times n$ matrix ($n = n_s + n_a + n_t$), and we now partition Z in the following way:

$$ Z = \begin{bmatrix} Z_{11} & Z_{12} \\ Z_{12}^T & Z_{22} \end{bmatrix}. $$

Here, $Z_{11}$ is an $h \times h$ matrix with $h = n_s + n_a$; $Z_{12}$ is an $h \times n_t$ matrix; and accordingly, $Z_{22}$ is an $n_t \times n_t$ matrix. Matrices Q and $Q_f$ can also be partitioned in the same way into submatrices $Q_{ij}$ and $Q_{fij}$ with the same dimensions as $Z_{ij}$ ($i, j \in \{1, 2\}$). Note the difference between A here and $\bar{A}$ in (3), as well as between $\bar{B}_s, \bar{B}_a$ and $B_s, B_a$. With the submatrices defined above, the Riccati equation (16) can be written separately in terms of $Z_{ij}$, $Q_{ij}$ and $Q_{fij}$ as

$$ \dot{Z}_{11} + \bar{A}^T Z_{11} + Z_{11} \bar{A} + Q_{11} - Z_{11} (\bar{B}_s \bar{B}_s^T - \bar{B}_a \bar{B}_a^T) Z_{11} = 0, \quad Z_{11}(T) = Q_{f11}; \qquad (18) $$

$$ \dot{Z}_{12} + Z_{12} A_t + \bar{A}^T Z_{12} + Q_{12} - Z_{11} (\bar{B}_s \bar{B}_s^T - \bar{B}_a \bar{B}_a^T) Z_{12} = 0, \quad Z_{12}(T) = Q_{f12}; \qquad (19) $$

$$ \dot{Z}_{22} + Z_{22} A_t + A_t^T Z_{22} + Q_{22} - Z_{12}^T (\bar{B}_s \bar{B}_s^T - \bar{B}_a \bar{B}_a^T) Z_{12} = 0, \quad Z_{22}(T) = Q_{f22}. \qquad (20) $$

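The advantage of partitioning the Riccati equation in this way is that the saddle-point strategy of the sensor (or the attacker) can be decomposed into two parts. Note that $x^T = [\bar{x}^T, x_t^T]$. Accordingly, the sensor's optimal control in (17) can also be written as

$$ u_s^* = -\bar{B}_s^T Z_{11} \bar{x} - \bar{B}_s^T Z_{12} x_t. \qquad (21) $$

Next, we define $b = Z_{12} x_t$. Taking the time derivative of b and using $\dot{Z}_{12}$ from (19), we obtain the differential equation of b(t) below:

$$ \begin{aligned} \dot{b}(t) &= \frac{d}{dt} (Z_{12} x_t) = \dot{Z}_{12} x_t + Z_{12} \dot{x}_t \\ &= Z_{12} (A_t x_t) - (Z_{12} A_t + \bar{A}^T Z_{12} + Q_{12}) x_t + Z_{11} (\bar{B}_s \bar{B}_s^T - \bar{B}_a \bar{B}_a^T) Z_{12} x_t \\ &= \left[ -\bar{A}^T + Z_{11} (\bar{B}_s \bar{B}_s^T - \bar{B}_a \bar{B}_a^T) \right] Z_{12} x_t - Q_{12} x_t \\ &= \left[ -\bar{A}^T + Z_{11} (\bar{B}_s \bar{B}_s^T - \bar{B}_a \bar{B}_a^T) \right] b(t) - Q_{12} x_t. \end{aligned} \qquad (22) $$

The terminal condition for equation (22), $b(T) = Z_{12}(T) x_t(T) = Q_{f12} x_t(T)$, follows directly from (19). Hence, b(t) is completely determined by solving (22) backward in time. Thus, the saddle-point strategy in (21) is

$$ u_s^* = -\bar{B}_s^T Z_{11} \bar{x} - \bar{B}_s^T b. \qquad (23) $$

By inspection of (23) and (22), the saddle-point equilibrium strategy actually does not depend on the assumption of the target's linear dynamics given in (14): equation (12) for $Z_{11}$ does not involve $A_t$, and (22) involves the target only through its trajectory $x_t(\cdot)$. Finally, the saddle-point equilibrium strategy of the attacker $u_a^*$ in (17) can be derived in the same way to obtain

$$ u_a^* = \bar{B}_a^T Z_{11} \bar{x} + \bar{B}_a^T b. $$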

Remark 1. The theorem can also be proved by solving the corresponding Hamilton-Jacobi-Isaacs equation associated with the game, without the auxiliary assumption in (14).

In Theorem 2, the saddle-point strategy of the sensor (or the attacker) has two terms. The first term is a feedback term that depends on the state variables of both the attacker (or the sensor) and the player itself. This part is the game strategy that is coupled with the attacker's (sensor's) reaction. The second term is a feedforward term that depends solely on the target's motion.
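Theorem 2 reduces the game to two backward ODEs, the $h \times h$ Riccati equation (12) and the linear equation (13) driven by the known target trajectory, after which (10)-(11) are a feedback term plus a feedforward term. A minimal sketch of computing them is below. It assumes NumPy and a fixed-step RK4 integrator rather than any particular ODE library; the helper names, step count, blow-up threshold and the example data (unit speeds, a hypothetical straight-line target, pure-tracking weights with $w_a = w^I_a = 0$) are all illustrative assumptions. With large attacker weights the backward integration may diverge, and the sketch then raises an error instead of returning a meaningless gain.

```python
import numpy as np


def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step for y' = f(t, y); h < 0 integrates backward."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def solve_Z11_b(A_bar, B_bar_s, B_bar_a, Q, Qf, x_t, T, N=1000, blowup=1e8):
    """Integrate (12) for Z11 and (13) for b backward from T to 0.

    Q, Qf: full n x n weights; x_t: callable t -> target state.
    Returns (ts, Z11s, bs) ordered from t = 0 to t = T.
    """
    h = A_bar.shape[0]                                   # h = n_s + n_a as in Theorem 2
    Q11, Q12 = Q[:h, :h], Q[:h, h:]
    S = B_bar_s @ B_bar_s.T - B_bar_a @ B_bar_a.T

    def rhs(t, y):
        Z, b = y[: h * h].reshape(h, h), y[h * h:]
        dZ = -A_bar.T @ Z - Z @ A_bar - Q11 + Z @ S @ Z  # (12) solved for dZ11/dt
        db = (-A_bar.T + Z @ S) @ b - Q12 @ x_t(t)       # (13)
        return np.concatenate([dZ.ravel(), db])

    ts = np.linspace(T, 0.0, N + 1)
    y = np.concatenate([Qf[:h, :h].ravel(), Qf[:h, h:] @ x_t(T)])  # Z11(T), b(T)
    ys = [y]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        y = rk4_step(rhs, t0, y, t1 - t0)
        if not np.all(np.isfinite(y)) or np.abs(y).max() > blowup:
            raise RuntimeError(f"Riccati solution blew up near t = {t1:.3f}")
        ys.append(y)
    ys = np.array(ys[::-1])
    return ts[::-1], ys[:, : h * h].reshape(-1, h, h), ys[:, h * h:]


def saddle_point_controls(Z11, b, x_bar, B_bar_s, B_bar_a):
    """Strategies (10)-(11) at one instant: feedback on x_bar plus feedforward b."""
    g = Z11 @ x_bar + b
    return -B_bar_s.T @ g, B_bar_a.T @ g


def q2_planar(w_s, w_a):
    """Q_2 of (6) when every state is a planar position (n_s = n_a = n_t = n_g = 2)."""
    I, O = np.eye(2), np.zeros((2, 2))
    return np.block([[(w_s - w_a) * I, w_a * I, -w_s * I],
                     [w_a * I, -w_a * I, O],
                     [-w_s * I, O, w_s * I]])


def target(t):
    """Hypothetical straight-line target trajectory (not the paper's example)."""
    return np.array([2.0 + 0.5 * t, 2.0 - 0.5 * t])


A_bar = np.zeros((4, 4))
B_bar_s = np.vstack([np.eye(2), np.zeros((2, 2))])
B_bar_a = np.vstack([np.zeros((2, 2)), np.eye(2)])
Qw = q2_planar(10.0, 0.0)
ts, Z11s, bs = solve_Z11_b(A_bar, B_bar_s, B_bar_a, Qw, Qw, target, T=5.0)
u_s, u_a = saddle_point_controls(Z11s[0], bs[0], np.array([-9.0, -4.0, -9.0, -9.0]),
                                 B_bar_s, B_bar_a)
```

In this pure-tracking case the sensor gain settles at $\sqrt{w^I_s}$ away from the terminal time, and the attacker's input is zero because its weights vanish.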
4 On Implementation of the LQ Strategies in Practice

The discussion above under the framework of the LQ game takes advantage of the availability of analytical solutions. However, its usefulness in practice remains to be tested. The gap between the LQ game approach and a real-world attack-avoidance game lies in the fixed terminal time T in the formulation, which imposes only a soft constraint on distances. To demonstrate the usefulness of the LQ formulation in practice, we propose a sequential implementation scheme as follows.

We choose $\Delta t > 0$ as the sampling time interval. At each sampling time $t_k = t_0 + k \Delta t$ for $k \in \{0, 1, 2, \ldots\}$, the saddle-point equilibrium strategies $\gamma_s^*$, $\gamma_a^*$ are solved over the interval $[t_k, t_k + T_k]$, where $T_k > \Delta t$ is the optimization horizon used in the quadratic objective (4). We will discuss shortly the choice of $T_k$ and the related issue of the existence of solutions for the corresponding Riccati equation. The game strategies $\gamma_s^*$, $\gamma_a^*$ are implemented only for the next $\Delta t$ interval, i.e., $[t_k, t_k + \Delta t)$. At the following sampling time $t_k + \Delta t$, the same procedure is repeated. We call this implementation scheme the LQ Receding Horizon Algorithm (LQRHA). The detailed calculation at each time $t_k$ is given in Table 1, where $w_s$, $w_a$, $w^I_s$, $w^I_a$ are the design parameters.
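The receding-horizon loop of the LQRHA can be sketched as follows, assuming NumPy. Here `solve_game` is a hypothetical placeholder for whatever routine solves the finite-horizon game on $[t_k, t_k + T_k]$ (for instance, by integrating (12)-(13) backward); the forward-Euler inner loop and the toy scalar usage are illustrative choices, not part of the paper.

```python
from typing import Callable, Tuple

import numpy as np

Strategy = Callable[[np.ndarray, float], np.ndarray]


def lqrha(x0, t0: float, dt: float, n_steps: int,
          solve_game: Callable[[np.ndarray, float], Tuple[Strategy, Strategy]],
          dynamics, substeps: int = 10) -> np.ndarray:
    """Receding-horizon loop in the spirit of the LQRHA (a sketch, not the authors' code).

    solve_game(x_k, t_k) -> (gamma_s, gamma_a): feedback laws for the horizon at t_k.
    dynamics(x, u_s, u_a, t) -> dx/dt: the aggregate model, e.g. (3).
    Only the first dt of each solution is applied, using forward Euler substeps.
    """
    x, t = np.asarray(x0, dtype=float), float(t0)
    traj = [x.copy()]
    h = dt / substeps
    for _ in range(n_steps):
        gamma_s, gamma_a = solve_game(x, t)          # re-plan at t_k
        for _ in range(substeps):                    # apply on [t_k, t_k + dt)
            x = x + h * dynamics(x, gamma_s(x, t), gamma_a(x, t), t)
            t += h
        traj.append(x.copy())
    return np.array(traj)


# Toy usage: scalar state, sensor feedback u_s = -x, passive attacker, x' = u_s + u_a.
calls = []


def toy_solver(x_k, t_k):
    calls.append(t_k)
    return (lambda x, t: -x), (lambda x, t: np.zeros_like(x))


traj = lqrha([1.0], 0.0, 0.1, 10, toy_solver, lambda x, us, ua, t: us + ua)
```

Re-solving at every $t_k$ is what lets a fixed-horizon LQ design act over an open-ended engagement.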


Table 1: Procedure at Each $t_k$ in the LQRHA

1. Input: state x at time $t_k$
2. Obtain the parameters $w_s$, $w_a$ and $T_k$
3. Solve the saddle-point equilibrium feedback strategies $\gamma_s^*$, $\gamma_a^*$ over the time interval $[t_k, t_k + T_k)$
4. Output: $\gamma_s^*$, $\gamma_a^*$

We now discuss how to choose a proper $T_k$ such that the corresponding Riccati equation (9) admits a bounded solution on $[0, T_k]$, or in other words, such that the interval $[0, T_k]$ contains no escape time [12]. A finite escape time (if it exists) of a Riccati equation can be determined in the way suggested by the following theorem. Note that since the problem here is time invariant, the existence of solutions over $[0, T_k]$ is essentially the same as that over $[t_k, t_k + T_k]$, regardless of $t_k$.

Theorem 3. The Riccati differential equation (RDE) (9) has a bounded solution over [0, T] if and only if the following matrix linear differential equation

$$ \begin{bmatrix} \dot{X}(t) \\ \dot{Y}(t) \end{bmatrix} = \begin{bmatrix} A & -S \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} X(t) \\ Y(t) \end{bmatrix}, \quad \begin{bmatrix} X(T) \\ Y(T) \end{bmatrix} = \begin{bmatrix} I_n \\ Q_f \end{bmatrix} \qquad (24) $$

has a solution on [0, T] with $X(\cdot)$ nonsingular over [0, T]. In (24), A, Q and $S = B_1 B_1^T - B_2 B_2^T$ are the corresponding matrices in (9). Moreover, if $X(\cdot)$ is invertible, $Z(t) = Y(t) X^{-1}(t)$ is a solution of (9).

Refer to [10], p. 194 or [12], p. 354 for a proof.

According to Theorem 3, we define the finite escape time $T_e > 0$ (if it exists) as the smallest positive number such that the matrix $X(T - T_e)$ is singular². The escape time can help determine the optimization horizon $T_k$. Suppose that we know how to choose an optimization horizon $\hat{T}_k$ (a design variable in the LQ design approach) based on the system states, without considering the existence of solutions for the Riccati equation, e.g., $\hat{T}_k = \hat{T}(x_k)$. Then, by solving the linear differential equation (24), it can be checked whether $T_e \in [0, \hat{T}_k]$. If $T_e \notin [0, \hat{T}_k]$, then $T_k$ can be chosen as $\hat{T}_k$; otherwise, $T_k$ can be set as $T_k = T_e - \delta$ for some $\delta > 0$. With $T_k$ chosen in this way, the Riccati equation (9) is guaranteed to have a bounded solution over $[0, T_k]$. Here, $T_e$ only needs to be calculated once because equation (9) does not depend on the state. On the other hand, we need to choose a proper sampling time $\Delta t$ such that $T_k > \Delta t$.

5 A Numerical Example

In this section, we demonstrate the usefulness of the LQ strategies by solving a selected attack-avoidance game in $\mathbb{R}^2$.


5.1 Players with Simple Motion

Suppose that the sensor and the attacker have the following simple motion dynamics in $\mathbb{R}^2$, given in x-y coordinates as

$$ \begin{cases} \dot{x}_s = v_s u_s \cos(\theta_s) \\ \dot{y}_s = v_s u_s \sin(\theta_s) \end{cases}, \qquad \begin{cases} \dot{x}_a = v_a u_a \cos(\theta_a) \\ \dot{y}_a = v_a u_a \sin(\theta_a) \end{cases} \qquad (25) $$

and the initial states are known. Define $\mathbf{x}_g = [x_g, y_g]^T$ as an aggregate state, where the subscript $g \in \{s, a\}$ stands for sensor or attacker. In (25), $x_g$, $y_g$ are the displacements along the x and y axes; $v_g$ is the speed, which is a constant; $u_g$, $\theta_g$ are the control inputs, where $u_g \in [0, 1]$ is a scalar that determines the player's moving speed from 0 up to $v_g$, and $\theta_g$ is the moving orientation. To make use of the LQ approach, in the following, we use equivalent dynamics of the form

$$ \begin{bmatrix} \dot{x}_s \\ \dot{y}_s \\ \dot{x}_a \\ \dot{y}_a \end{bmatrix} = \begin{bmatrix} v_s u_{sx} \\ v_s u_{sy} \\ v_a u_{ax} \\ v_a u_{ay} \end{bmatrix}. \qquad (26) $$

In (26), $(u_{gx}, u_{gy})$ are the control inputs with the constraint $\sqrt{u_{gx}^2 + u_{gy}^2} \leq 1$. Clearly, $(u_g, \theta_g)$ and $(u_{gx}, u_{gy})$ form a one-to-one mapping with $\theta_g \in [0, 2\pi)$, through $u_{gx} = u_g \cos(\theta_g)$ and $u_{gy} = u_g \sin(\theta_g)$.

² Note that the Riccati equation (9) is solved backward since its value is given at the final time T. Hence, $T - T_e$ is used here.

The dynamics in (26) are linear in the inputs $\mathbf{u}_g = [u_{gx}, u_{gy}]^T$ but with an additional boundedness constraint. We still rely on the LQ approach to design the feedback control law $\gamma_g^*$. To ensure boundedness, the following nonlinear function $\varphi(\cdot)$ is used:

$$ \varphi(r) = \begin{cases} r, & \text{if } \|r\| \leq 1 \\ r / \|r\|, & \text{if } \|r\| > 1 \end{cases} \quad \text{for } r \in \mathbb{R}^m \text{ with } m \geq 1. \qquad (27) $$

In the simulations, the actual control applied is $\mathbf{u}_g = \varphi(\gamma_g^*(x))$.
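The saturation (27) and the equivalence between the speed/heading inputs of (25) and the Cartesian inputs of (26) are short to state in code. The following sketch assumes NumPy and the standard `math` module; the function names are illustrative, and the heading is set to 0 when the speed is 0, where it is otherwise undefined.

```python
import math

import numpy as np


def phi(r):
    """Saturation (27): keep r inside the unit ball, otherwise rescale it to unit norm."""
    r = np.asarray(r, dtype=float)
    norm = np.linalg.norm(r)
    return r if norm <= 1.0 else r / norm


def to_speed_heading(u_vec):
    """(u_gx, u_gy) -> (u_g, theta_g) with theta_g in [0, 2*pi); theta_g = 0 when u_g = 0."""
    ux, uy = u_vec
    theta = math.atan2(uy, ux) % (2 * math.pi)
    if theta >= 2 * math.pi:          # guard: a tiny negative angle can round up to 2*pi
        theta = 0.0
    return math.hypot(ux, uy), theta


def from_speed_heading(u, theta):
    """(u_g, theta_g) -> (u_gx, u_gy) = (u_g cos(theta_g), u_g sin(theta_g))."""
    return np.array([u * math.cos(theta), u * math.sin(theta)])


# An unbounded LQ output such as (3, 4) is saturated to the unit vector (0.6, 0.8).
u = phi([3.0, 4.0])
speed, heading = to_speed_heading(u)
```

Saturating in Cartesian form keeps the commanded direction of the LQ law and only clips its magnitude to the admissible speed.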

5.2 Attack Avoidance Game

We consider an attack-avoidance problem with a mobile target that moves along a specific trajectory, which is known to both the sensor and the attacker. The movement of the target is described by the following equation.


$$ \begin{cases} x_t = 0.5\,t \\ y_t = -0.5\,t - 5\sin(|t) \end{cases} $$

The speeds and the initial positions of the players are specified in Table 2. The units of the parameters can be arbitrary and are not specified here.


Table 2: Simulation Parameters

                    Sensor      Attacker    Target
Speed               1           1
Initial Position    (-9, -4)    (-9, -9)    (2, 2)

We apply the LQRHA algorithm to determine both the sensor's and the attacker's strategies. Let the sampling time interval be $\Delta t = 0.1$ second. At each sampling time $t_k = t_0 + k \Delta t$, the optimization horizon $T_k$ is chosen as 15 seconds for all k. The parameters in the objective functional (5) are chosen as $w_s = w^I_s = 10$ and $w_a = w^I_a = 100$. Here, $w^I_s = 10$ indicates that the distances between players are much more important than the control energy needed in the LQ formulation. The relative values of $w_a$, $w^I_a$ and $w_s$, $w^I_s$ are chosen to reflect the fact that avoiding attacks is of greater importance than tracking.

Figure 1 depicts the players' trajectories under the LQ game strategies. Here, the arrows indicate the instantaneous moving directions of the trajectories at the end of the simulation. With the LQ game strategy, the sensor is able to follow the target and stay away from the attacker. On the other hand, the attacker can closely follow the sensor and position itself well between the sensor and the target. This is important because the attacker may lose its ability to reach the sensor if it follows the sensor too closely, as the cases below, in which the attacker uses other strategies, will show.

Figure 1: Players' Trajectories with the LQ Game Strategies

Next, we compare the LQ game strategies with alternative tracking strategies (designed purely for tracking problems) adopted by the sensor and the attacker, respectively. In each case, one player keeps its current LQ game strategy unchanged while the other player switches to another strategy. Two alternative strategies are considered, and both are well-known navigation strategies [8]. One is an LQ tracking strategy that is directly obtained by following the same procedure described in this paper, i.e., by solving an optimization problem with the objective function (4) where the weights $w_a$, $w^I_a$ (or $w^I_s$, $w_s$) are set to zero. With zero weights on the selected penalty terms, it is no longer a game problem but an optimal tracking problem between the sensor and the target (or between the attacker and the sensor). The other strategy considered is the so-called Line-Of-Sight (LOS) strategy, which will be specified shortly. Both strategies are designed merely to track an object of interest. Apart from the changes in the players' strategies, all other parameters of the game remain the same, and the same game duration is used in all simulations.

We  first  simulate  a  scenario  where  the  sensor  still 
uses  the  same  game  strategy  determined  earlier,  while 
the  attacker  switches  to  other  strategies.  The  game 
result  in  Figure  2  shows  the  case  where  the  attacker 
uses  the  LQ  tracking  strategy. 

On the other hand, a possible LOS feedback strategy of the attacker is defined as follows:

$$ u_a^{\mathrm{LOS}} = v_a \frac{x_s - x_a}{\|x_s - x_a\|} \quad (x_s \neq x_a). $$

Figure 3 illustrates the simulation result when the attacker uses this LOS strategy.
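Both LOS laws in this section steer along the unit line-of-sight vector toward the object being chased. The minimal NumPy sketch below uses an illustrative function name and a `gain` parameter that stands for the speed factor $v_a$ in the attacker's law (and is 1 for the sensor's law); positions are the planar coordinates of the players, and the example values come from Table 2.

```python
import numpy as np


def los_control(p_from, p_to, gain=1.0):
    """Line-of-sight input: gain * unit vector from p_from toward p_to (positions must differ)."""
    d = np.asarray(p_to, dtype=float) - np.asarray(p_from, dtype=float)
    dist = np.linalg.norm(d)
    if dist == 0.0:
        raise ValueError("LOS direction is undefined when the two positions coincide")
    return gain * d / dist


# Attacker at (-9, -9) chasing the sensor at (-9, -4) with v_a = 1 (Table 2).
u_a_los = los_control(p_from=[-9.0, -9.0], p_to=[-9.0, -4.0], gain=1.0)
print(u_a_los)  # [0. 1.]
```

Calling `los_control(sensor_position, target_position)` gives the sensor's LOS input in the same way. Unlike the game strategy, this law uses only the current positions and makes no prediction of the chased player's motion.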

In both cases above, without predicting the sensor's movement from the motion of the target, the attacker loses its capability of intercepting the sensor, since it moves at the same speed as the sensor.

Similar comparisons have also been drawn when the sensor deviates from its LQ game strategy. In this case, the attacker keeps the same LQ game strategy while the sensor uses the LQ tracking strategy and the LOS strategy in turn. The game results are plotted in Figure 4 and Figure 5, respectively. The LOS strategy for the sensor is given as

$$ u_s^{\mathrm{LOS}} = \frac{x_t - x_s}{\|x_t - x_s\|} \quad (x_s \neq x_t). $$

Figure 2: Players' Trajectories When the Attacker Uses the LQ Tracking Strategy

Figure 3: Players' Trajectories When the Attacker Uses the LOS Strategy

Figure 4: Players' Trajectories When the Sensor Uses the LQ Tracking Strategy

Figure 5: Players' Trajectories When the Sensor Uses the LOS Strategy

In both cases, the sensor is intercepted by the attacker within the simulation time. Other tracking strategies that do not consider the attacker would likely lead to a similar result.

Finally, Figure 6 shows the players' trajectories when the sensor implements an escaping strategy that is determined by solving the same game problem with the weights $w^I_s = w_s = 0$ in the objective function (4). Since $w^I_s = w_s = 0$, i.e., there is no penalty on tracking the target, the main objective of the sensor is to escape from the attacker. Here, the attacker uses the same LQ game strategy.

Based on the simulation examples above, the LQ game design provides the sensor with a better compromise between tracking and avoiding attacks. From the sensor's perspective, implementing either a pure target-tracking strategy or a pure escaping strategy has obvious disadvantages: each represents an extreme of the entire spectrum of possible strategies, from mere escaping to mere tracking. As seen in the simulations, a sensor that follows a pure tracking strategy without considering the attacker is likely to be destroyed by the attacker. On the other hand, with an escaping strategy, tracking of the target is given up. In both cases, the tracking mission could fail. The advantage of the LQ game approach is that the knowledge or a prediction of the target's future movement gives the sensor a better chance to avoid possible attacks while keeping track of the target. When the danger of being attacked is eliminated, the sensor is still in a good position for tracking tasks under normal conditions.

Figure 6: Players' Trajectories When the Sensor Uses the Escaping Strategy

Another observation is that the LQ game strategy also provides a better attacking strategy for the attacker. The simulations show that blindly going after the sensor can cost the attacker the chance of intercepting it. The game strategy implicitly predicts the sensor's movement and better aligns the attacker's movement with the sensor and the target in this three-entity game.

Based  on  a  number  of  simulations,  it  is  clear  that 
the  LQ  strategy  with  the  LQRHA  implementation  can 
provide  fairly  good  guidance  laws  for  both  the  sensor 
and  the  attacker. 

6  Conclusions 

In this paper, we have studied an attack-avoidance problem under the framework of an LQ game formulation. From a practical point of view, the inherent hard constraints have been approximated and replaced by soft constraints with a fixed optimization horizon. We have derived equilibrium strategies for both the sensor and the attacker. For implementation, a receding horizon algorithm called LQRHA has been proposed for applying the LQ strategies. Simulations have shown that this LQ game design can provide the sensor with a better compromise between tracking and avoiding attacks, a situation in which a traditional tracking design can fail. Overall, the LQ strategies based on the LQRHA implementation can provide good control guidance laws for both players in this problem.

The main limitation of the approach is the assumption that the trajectory of the target is known to both the sensor and the attacker. In practice, this assumption can be interpreted as the sensor's prediction of the target's movement. In a broader sense, the target here can also represent an uncertain area to be searched, and the “known trajectory” may represent the areas of highest interest.